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Abstract 

We present a polynomial hybrid Monte Carlo (PHMC) algorithm for lattice QCD with odd 
numbers of flavors of 0(a)-improved Wilson quark action. The algorithm makes use of the non- 
Hermitian Chebyshev polynomial to approximate the inverse square root of the fermion matrix 
required for an odd number of flavors. The systematic error from the polynomial approximation 
is removed by a noisy Metropolis test for which a new method is developed. Investigating the 
property of our PHMC algorithm in the Nf = 2 QCD case, we hnd that it is as efficient as the 
conventional HMC algorithm for a moderately large lattice size (16^ x 48) with intermediate quark 
masses {mps/my ~ 0.7-0.8). We test our odd-flavor algorithm through extensive simulations of 
two-flavor QCD treated as an Nf = 1 -|- 1 system, and comparing the results with those of the 
established algorithms for Nf = 2 QCD. These tests establish that our PHMC algorithm works on a 
moderately large lattice size with intermediate quark masses (16^ xA8,mps/my ~ 0.7-0.8). Finally 
we experiment with the (2 -|- l)-flavor QCD simulation on small lattices (4^ x 8 and 8^ x 16), and 
confirm the agreement of our results with those obtained with the R algorithm and extrapolated 
to a zero molecular dynamics step size. 

PACS numbers: 12.38.Gc, 02.60.Cb, 02.60.Gf, 75.40.Mg 


*Present address: Department of Physics, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8526, 
Japan 
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I. INTRODUCTION 


An essential step toward realistic lattice simulations of quantum chromodynamics (QCD) 
is to develop efficient algorithms to incorporate the dynamical sea quark effects of up, down, 
and strange quarks. Most of the recent dynamical QCD simulations have been, however, 
limited to two-flavor QCD where up and down quarks are treated dynamically while the 
loop effect of the strange quark is still neglected. This is mainly due to the lack of efficient 
algorithms to treat an odd number of dynamical quark flavors. The R algorithm |]^ is a 
possible candidate for this purpose, but its serious drawback is the systematic error of 0{dR) 
stemming from a finite step size dt in the molecular dynamics evolution. To control this 
systematic error, one has to keep dt small enough and to monitor the size of the error by 
performing simulations at various values of dt, which requires much computational effort. 
Therefore, an exact algorithm such as the hybrid Monte Carlo (HMC) algorithm |^, which 
is widely used for simulations with an even number of flavors, is clearly desirable. 

Recently, Takaishi and de Forcrand proposed an algorithm for an odd number of dynam¬ 
ical flavors P|. They use the polynomial hybrid Monte Carlo (PHMC) algorithm i B i 
with a non-Hermitian Chebyshev polynomial, with which one can approximate the inverse 
square root of the fermion matrix needed for the simulation of an odd number of flavors § . 
They introduced a method to calculate the correction factor required to compensate for the 
truncation error due to the finite order of the polynomial, and hence the algorithm is exact. 
The algorithm was tested on a small lattice for 1, 1-1-1, and 2-1-1 flavors of Wilson fermions. 

Clearly, the next step toward realistic simulations of QCD is to investigate the practical 
feasibility of their algorithm for two light (up and down) quarks and one relatively heavy 
(strange) quark on large physical volumes. In this case, up and down quarks are treated 
with the usual pseudofermion method, while the strange quark is incorporated with the 
polynomial approximation. It is known that the multiboson algorithms, which also rely on 
the polynomial approximation for the inverse of fermion matrix, fail for light quarks 0. 
Therefore, we need to examine whether the algorithm with the polynomial approximation 
works for intermediate quark masses (around the strange quark). An implementation of the 
algorithm for the 0(a)-improved Wilson (clover) quark action is also important to carry 
out simulations with reduced systematic errors due to finite lattice spacing. 

In this work we present a modified algorithm for (2 -|- l)-flavor QCD with the 0(a)- 
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improved Wilson quark action. Our algorithm is a variant of PHMC with the non-Hermitian 
Chebyshev polynomial as that of Takaishi and de Forcrand |^, while the treatment of the 
correction factor is different. We test our algorithm for two different systems. One is two- 
flavor QCD treated as a system with 1-1-1 flavors, and the simulation results are compared 
with those of the conventional HMC for two flavors. The other is (2 -|- l)-flavor QCD, where 
our algorithm is compared with the R algorithm [Q] after extrapolating to zero step size 
dt 0. 


We also perform two systematic numerical tests of the HMC and PHMC algorithms in 
two-flavor QCD in order to provide a basis to hnd the best method and parameter choices 
for an extension to realistic simulation with 2-1-1 flavors. 

As a hrst step, we test the even-odd preconditioning for the 0(a)-improved Wilson 


fermion action, which was hrst proposed by Luo and Jansen and Liu [|I^. They in¬ 
troduced symmetrical and asymmetrical preconditioning, and mainly considered the asym¬ 
metric version. In our practical tests we found signihcant improvement for both versions 
over the simulation without preconditioning. The improvement is more pronounced for the 
symmetric case and the computer time can be reduced almost by a factor two from that 
without preconditioning. 

Second, we investigate the efficiency of the PHMC algorithm depending on the quark 
mass and on the degree of the polynomial. We found that the PHMC is as effective as the 
conventional HMC algorithm for two different quark masses corresponding tomps/my = 0.8 
and 0.7 on a reasonably large lattice. This observation is encouraging, as it suggests that 
the polynomial approximation is useful for future simulations of (2 -|- l)-havor QCD. 

The rest of the paper is organized as follows. In Sec. H we outline the algorithms we 
consider in this paper. The polynomial hybrid Monte Carlo (PHMC) algorithm and its 
generalization to an odd number of flavors is described. In Sec. IP we test the efficiency of 
the even-odd preconditioning for the 0(a)-improved Wilson fermion action using the usual 
HMC algorithm with two-flavor of quarks. We then investigate the efficiency of the PHMC 
algorithm for two-flavor QCD in Sec. Section ^ describes details of our algorithm for 
an odd number of flavors, and presents some numerical tests with which the consistency 
and the applicability is investigated. Our conclusion is given in Sec. 0. Our algorithm and 
simulation code have already been used for a study of the phase structure of three-flavor 
QCD with the Wilson-type fermion actions . 
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II. OUTLINE OF THE ALGORITHM 


We first present the outline of our algorithm for + A/ 2 )-flavor QCD, where Nf^ is an 
even number while Nf^ is odd. The details of the algorithm will be explained separately in 
later sections. 

In this section we consider the Wilson gauge and fermion actions, but the algorithm can 
be applied to more complicated lattice actions arising in the Symanzik improvement pro¬ 


which has a clover-leaf-type operator to remove the discretization error of 0{a). 


gram M|. In particular, the algorithm is suitable for the 0(a)-improved Wilson action TO 


A. Pseudofermion representation for even number of flavors 


Let Di and D 2 be the Dirac operators for two different fermion masses corresponding to 
Nf^ and Nf^ flavors, respectively. The partition function of this fermion system is given by 

Z = yT>f/(det[Di])^*(det[D2])^^2e-^s[^l, (1) 

where Sg[U] represents the gauge action. 


m 




-Nfj2 


( 2 ) 


Since Nf^ is an even number, the fermion determinant (det[Di])^* can be expressed i 
terms of the usual pseudofermion integral 

(det[Di])^^i = JV(j)\V(j)iexp 

where we have used the relation = 75 ZI 175 . We use a short-hand notation for the norm 
of a vector X as |Xp = J2n,a,a with n the site index, a the spinor index, and a the 

color index. 

In the usual HMC algorithm one uses some iterative solver to calculate the inverse of 
the fermion matrix Di. In the PHMC algorithm [^, ^, on the other hand, one introduces 
a polynomial PNpoiyi^] of order Npoiy that converges 1/z as Npoiy 00 . The non-Hermitian 
Chebyshev polynomial 

PnpJz] c,{l - z)\ (3) 

i=0 

with Ci = (—1)* is an example of such a polynomial, when |1 — ^| < 1. Supposing that all 
eigenvalues of Di fall inside the complex domain |1 — 2 ;| < 1 , we have 


(det[Di])^/i = 


det[Pv ,jT^i]] 
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det[DiP^ [Di]] 


iN 


fi 


\T>(j)i exp 




( 4 ) 


We notice that the inversion of the fermion matrix is replaced by a calculation of 

the polynomial 

Following the original proposal of the multiboson algorithm by Liischer JT^, Frezzotti and 
Jansen ii considered a Hermitian operator Q = cm 75 -Di with cm a normalization factor 
and used a polynomial approximation of det[Pi]^ = det[Q]^ rather than the non-Hermitian 
det[iJi], using the 75 Hermiticity property Pj = 75^175 of the Wilson-type lattice fermions. 
In this work, however, we consider the non-Hermitian relation Eq. (^), as it is suitable for 
the extension to an odd number of flavors. 

Since the polynomial approximation introduces a truncation error, one has to evaluate 


the correction factor 


det[F>iPvp„iJ^i]] 


n N 


fi 


in order to make the algorithm exact. As the 
correction factor is close to unity when the polynomial is a good approximation of the inverse, 
a stochastic technique can be used to incorporate the correction factor. The reweighting 
method and the global Metropolis test ^ have been proposed and used in the 
multiboson algorithm. For the PHMC algorithm, the reweighting method is applied in 
Refs. and the global Metropolis test in Ref. [^. We use the global Metropolis test 

developed for a multiboson algorithm 0, |^. The details of the global Metropolis test in 
the case of Nf^ = 2 will be given in Sec. [IV Q . 


B. Pseudofermion representation for an odd number of flavors 

For an odd number of flavors Nf^, we use the method developed by Alexandrou et al. 
to take a “square root” of the polynomial as described below. 

We consider a polynomial Pnj,oIv [^] with an even degree N^oiy and rewrite it as a product 
of monomials 

^poly ^poly 

PN^olyi^] = 5] Q(z - 1)* = n (^ - (5) 

i=0 k=l 

which approaches 1/z as Npoiy increases. At this point the convergence radius is assumed 

to cover all eigenvalues of the Wilson-Dirac operator, which will be conhrmed in Sec. |V| 
numerically. Since Zk appears with its complex conjugate, we may rewrite Eq. (j^) as 

^poly /‘^ 

PN,oly[A = CN,,ly n (^ - (6) 

J = 1 
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where k{j) and k'{j) are the arbitrary reordering indices dehned to satisfy the relation 
^k{j) ~ with j = 1.. .Npoiy/2. Using the property = 75 -D 275 one can show that 
det[£>2 - 4(i)] = det[Zi )2 — ^k{j)]^ and 


det[P/v,„iJ^2]] 


^poly/‘^ 

^^poly n det[U>2 - Zk{j)]^ det[U>2 - ^fc(i)] 

i=i 


( 7 ) 


where — Zk(j))- Then we obtain a psendofermion representation 

for an odd nnmber of flavors 


(det[T)2])^^2 = 


' det[T>2Pjv,„,jT>2]] 

det[P/v,„,jT>2]] 


n N 


f2 


1 N 


det[T>27^iVwJT>2]] /T>0^T>02exp 




( 8 ) 


N. 


f2 


As in the case of an even nnmber of flavors, the correction factor det[Zi) 2 -P/Vpoi,, [-D 2 ]] 
has to be kept to constrnct an exact algorithm. We describe the calcnlation of the correction 
factor for the Nf^ = 1 case in Sec. |VB 


We note that in this constrnction, the positivity of det[Zi) 2 ] is assnmed. Since the Wilson- 
type lattice fermions does not have chiral symmetry, the Wilson-Dirac operator D 2 may 
develop a real and negative eigenvalne, which conld make det[Zi) 2 ] negative. In actnal sim- 
nlations, we do not expect that this happens for the following reason. Under a continnons 
change of gange conhgnration, as in the molecnlar dynamics evolntion, the eigenvalnes also 
change continnously. To change the sign of a real eigenvalne it has to cross zero, for which 
the determinant det[D 2 ] vanishes which is snppressed. In addition, since the single flavor 
part is to be identihed with strange qnark in realistic applications, we expect that the in¬ 
termediate mass of strange qnark behaves as an infrared cntoff obstrncting the appearance 
of negative eigenvalnes. 


In onr implementation, we nse the fact that the correction factor det [Zi) 2 -P/v [-D 2 ]] 


N. 


f2 


is close to nnity. If this does not hold, the calcnlation will fail to converge. We shonld, 
therefore, be aware of the appearance of a negative determinant. Onr algorithm fails if this 
happens, bnt a negative determinant shonld be considered as a problem of the formnlation 
of the lattice fermion rather than the problem of the algorithm, since it is related to the lack 
of chiral symmetry. 
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C. Hybrid Monte Carlo algorithm 


Once we write an effective action for the fermion determinant using pseudofermions as in 
Eqs. (^, (H), and (||), it is straightforward to apply the hybrid Monte Carlo algorithm |0] to 
obtain an ensemble of gauge configurations including the effect of the approximated fermion 
determinant. 

Introducing a hctitious momentum P conjugate to the link variable U (we suppress the 
site, direction, and color indices), the partition function Eq. ( 0 ) is written as 

Z = JvUVPV(j){V(j)iV(j)lV(j)2det[W]e-^. (9) 


If we use the usual form Eq. (^) for an even number of flavors, and the polynomial represen¬ 
tation Eq. (H) for the rest of the fermions, the effective Hamiltonian H and the correction 
factor det[IE] take the form 


H 

det[iy] 


ip^ + s,[(7| + |r>, 

r n Nf 



[B2 



( 10 ) 


The HMC algorithm consists of the following four steps, for a given gauge configuration U. 


(1) Generate momenta P and pseudo-fermion fields and <p 2 from a Gaussian distribution 

with unit variance and zero mean. 


(2) Integrate link variables U according to the discretized molecular dynamics evolution 
equation derived from the equation of motion 

Ut,{n) = iP^,{n)U^{n), 

P^l{n) = -f[f/^(n)F^(n)]T,A., (11) 

where X is the derivative of a field X with respect to the fictitious time t and [• ■ -It.a. 
means the traceless anti-Hermitian part of the matrix in the bracket. The force F^{n) is 
defined through a variation of the effective Hamiltonian under an infinitesimal change 
5U^{n) of the gauge link variable 

6H = Y.i:i [{6U^{n)F^{n)] + H.c.]. (12) 

The length in the fictitious time t is arbitrary, which we set equal to unity throughout 
this paper. 






(3) Make a Metropolis test with respect to the energy difference dH between the initial 
configuration 17(0) and the trial configuration U{t). The acceptance probability is 
-Pacc[(17(0), T’(O)) —T’(t))] = niin[l, If the test is accepted go to the next 

step (4), or else the new configuration is set to (17(0), T’(O)) and go back to step (1). 


(4) Make a Metropolis test with respect to the correction factor det[iy]. If the test is 
accepted (17 (t), P(t)) is taken as the new configuration, or else the new configuration is 
(17(0), /’(O)). Then return to step (1). The details to obtain the acceptance probability 


is described in Sec. VB 


III. EVEN-ODD PRECONDITIONING FOR THE 0(a)-IMPR0VED WILSON 
FERMION ACTION 


Before going to the PHMC algorithm we discuss the even-odd preconditioning of the 
fermion determinant. The even-odd preconditioning is a widely used technique to accelerate 
the fermion matrix inversion 0, but it can also be used to reformulate the fermion deter¬ 
minant so that the pseudofermion field lives only on odd sites [18, O. For the unimproved 


Wilson fermion action, no extra computational cost is required by the reformulation, while 
the HMC simulation becomes faster, since the phase space to be covered is reduced by a 
factor of two. Luo 0 and Jansen and Liu [T^ introduced the even-odd preconditioning 
for the 0(a)-improved Wilson fermion which includes the clover-leaf-type operator. In this 
section we review their formulation and describe our extensive numerical test to see how it 
improves the efficiency of the HMC algorithm. 


A. Description of the preconditioning 


The determinant of the (9(a)-improved Wilson fermion operator D is written as 

/ 


det[i7] = det 


1 + Tee Tfeo 


(13) 


^ Mof, 1 + Too j 

when the site index n is numbered such that even sites come earlier than any odd site. Here, 
the site is even (odd), if Ux + Uy + Uz + rit is an even (odd) number. The hopping term M 
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(Meo or Moe) represents the usual Wilson fermion matrix 


Mn,n' = ^ {(1 - llx)Ufj,{n)5n+[,y + (1 + liJiWlin - , ( 14 ) 

M=1 


while T (Tee or Too) describes the 0(a)-improvement term (or SW term) 

Tn^n' ~^Cs^K(Jav{j^)^n,n' 1 

with the clover-leaf-type held strength given by 

^ [[Uf,{n)Uo{n + (i)Ul{n + 0)Ul{n) 

+Uy{n)Ul{n + P - ii)Ul{n - jT)U^{n - p) 

+Ul{n - ij)Ul{n - p - i>)U^{n - fi - i>)Uo{n - P) 
+Ul{n - P)U^{n - P)Uu{n - P + /i)f/^(n)| - H.c. 


(15) 




(16) 


where H.c. denotes the Hermitian conjugate of the preceding bracket. The Dirac matrix 7 ^ 
is dehned such that it is Hermitian, and = (i/ 2 )[ 7 ^, 7 ^,]. 

Factoring out the even-even component (1 + Tee) from the determinant Eq. ([T^) , we have 


det[D] = det[l -F T^e] det[i);fj, 


where 


bi = (1 + T)oo - Moe{l + T)-jMoo. 

It is also possible to factor out both the even-even and odd-odd components as 

det[£>] ^ det[l + T^e] det[l + Too] det[T>fo], 


( 17 ) 


(18) 


(19) 


where 

£>fo = 1 - (1 + (20) 

In the following, we refer to Eqs. (pTl) and (|T^ as asymmetric and symmetric preconditioning, 
respectively. To our knowledge, previous simulations in the literature have exclusively been 
made with the asymmetric even-odd preconditioning. 


Using Eqs. (lU) and (1^), the asymmetrically preconditioned partition function for two 
havor QCD can be written as 




A-HMC 


o^'To'- 


a-hmcTT, 4>o] 
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ffA-HMcl-P.C/.^ol = \P^ + S,\U] + S^[U,4:] + SUU], 
StPAo] = 

SL[U\ = -21ogdet|l + T„|. 


( 21 ) 


The pseudofermion field (j)o lives on odd sites, whereas the determinant det [1 + Tee] of the 
local SW term is calculated on even sites. 

For the symmetrically preconditioned partition function, from Eqs. (0) and (PPP we have 



H^umcIP.UA.] = 5^’" + S3|£/l+Sf[£/,<#'J + Si,[C/], 

sj'KW = |(r'o®)'Tof. 

SiilCl] = -2(logdet[l + r„] + logdet[l + rj). 


( 22 ) 


In this case the determinant of the local SW term is calculated both on even and odd sites. 

The calculation of the force dehned in Eq. ([I^ can be divided into several parts corre¬ 
sponding to the contribution from the pure gauge action, the pseudofermion part, and the 
determinant of the local SW term. We write down the contribution from the quark part in 
the Appendix for both preconditioning methods. 

B. Efficiency of the even-odd preconditioning 

The even-odd reformulation of the fermion determinant reduces the phase space to be 
covered by the HMC simulation. Another important effect of the preconditioning is that it 
lifts the lowest eigenvalue of the fermion matrix and thus the condition number is reduced. 
The strength of the force coming from the pseudofermionic part Sq[U,(j)] of the effective 
Hamiltonian becomes smaller [Q, since it is proportional to the inverse of the lowest eigen¬ 
value of the Dirac matrix. Therefore, the error dH accumulating in the molecular dynamics 
evolution is also expected to become smaller, resulting in a better acceptance rate in the 
HMC algorithm. To what extent the condition number is reduced depends on the particu¬ 
lars of preconditioning. We expect the symmetric one to work better, since in the hopping 
parameter expansion behaves as 1 — while contains a term proportional to k 

coming from Too- 
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In the following we describe a systematic test of the effect of the preconditioning of both 
types. The test is performed on three lattices: (i) a small lattice of size 8^ x 16 with a heavy 
quark mass, which we call the “small heavy” lattice, (ii) a large lattice of size 16^ x 48 with 
a heavy quark mass called “large heavy,” and (hi) a large lattice of size 16^ x 48 with a 
light quark mass called “large light.” Here, heavy and light quarks roughly correspond to 
mps/mv=0.8 and 0.7, respectively. The lattices (ii) and (iii) are reasonably large to study 
the light hadron spectrum. They are actually used in our production run Details of 
the lattice parameters are listed in Table |. 


Extensive test on a small lattice 


On the small heavy lattice, we investigate the molecular dynamics (MD) step size dt 
dependence of the acceptance rate Pace for each algorithm; “HMC” denotes the HMC algo¬ 
rithm without the preconditioning, “A-HMC” and “S-HMC” are used for the asymmetrically 
or symmetrically preconditioned HMC algorithm. 

We employ the BiCGStab algorithm to calculate the inverse of the Dirac matrix 
D (or Dfo). The symmetrical even-odd preconditioning is applied in the solver to 
accelerate the convergence of inversion. The stopping condition is dehned so that the solver 
iterates until the residual dehned by r = \J\Dx — 6p/|6p becomes smaller than a certain 
value, where 6 is a source vector and x is the solution vector. On the small heavy lattice, 
we use a rather strict stopping condition to avoid systematic errors coming from the matrix 
inversion. All numerical calculations are made with the double precision (64 bit) arithmetic. 
In Table ||, we show the number of the molecular dynamics (MD) steps Nmd {dt = 1/Amd) 
and the stopping condition for the BiCGStab solver in force and Hamiltonian calculations. 

For the MD evolution of the kinematical variables U and P, the simplest integration 
scheme to satisfy the reversibility and measure preservation is the leapfrog algorithm. In 
this work we hrst consider two options of the leapfrog algorithm, i.e., UPU and PUP 
integrators. In the UPU integrator, the link variable U is updated at the hrst half step 
and then the integration of P with a unit step size dt follows. Thus the link variable U is 
assigned at {n + 1/2) ■ dt with an integer n, while P is assigned at n ■ dt. The integration is 
performed in the reverse order in the PUP integrator. 

The acceptance rate in the HMC algorithm is governed by a change of the effective 
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Hamiltonian during the MD evolution {dH) as Pace = erfc(((iif)^/^/2). With the leapfrog 
integrator the change of effective Hamiltonian behaves as {dH) ~ dP for small dt ^ ^ . 

In Fig. D we show the MD step size dt dependence of {dH) for both UPU and PUP 
integrators. The dotted lines represent a £t with a form {dH) = tt {a ■ dt^. The Metropolis 
acceptance rate is plotted in Fig. ^ as a function of dt. The expected behavior eTfc[y/n{a ■ 
dty/2] is also shown by dotted curves. We observe that the data is described by the 
expected functional form. We also hnd that the UPU integrator gives better acceptance 
at a hxed dt than the PUP integrator, which has been known for a long time for the 
staggered fermion action |^. The computational cost with the UPU integrator is lower by 
a factor A^md/(^md + 1) than the PUP integrator since the computer time in dynamical 
QCD simulations is dominated by the force calculation that involves the fermion matrix 
inversion. Therefore the advantage of the UPU integrator is very clear. We then use the 
UPU integrator in the rest of this work. 

Let us now discuss the effect of preconditioning. Figures ^ and ^ show the MD step 
size dependence of {dH) and Pace for the HMC, A-HMC and S-HMC algorithms. The dt 
dependence for each algorithm is described very well by the relation {dH) oc dP as shown 
in Fig. and the value of {dH) for A-HMC (S-HMC) at a fixed dt is about a factor 5 (13) 
smaller than the unpreconditioned HMC. As a result, the acceptance is greatly improved as 
shown in Fig. For instance, at dt=0.02 Pace is 81% (88%) for A-HMC (S-HMC) compared 
to 60% for the unpreconditioned case. 

The efficiency of the algorithm may be defined as Pace ■ dt following Ref. In order to 
plot the efficiency Pace • dt as a function of Pace, we make use of an approximation of Pace'- 

1 


p = 

n.r r 


= exp (a ■ dt)^ — - {a ■ dt)'^ . 


(23) 


This approximation is valid for small dt [up to 0{dP)] and {dH) = 7r(a ■ dt)'^. The validity 
can be ascertained in Fig. H, where the approximation Eq. (1^) is plotted (dotted curve) as 
well as the exact one erfc[-y/F(a ■ dt)^/2] (dashed curve). Solving Eq. (|^) for dt, we obtain 
the explicit functional form for the efficiency Pace • dt as 

Pa 


Pace ' dt 


1 -2\og{Pacc) - 1 , 


(24) 


where the only parameter is a defined through {dH) = n{a ■ dt)'^. In Fig. ^ we plot the 
efficiency Pace ■ dt as a function of Pace, and Eq. (^4]) is plotted as a dotted line. It is 
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remarkable that the optimal efficiency is reached when Pace — 0.65 irrespective of details of 
the algorithm as far as we use the simplest leapfrog integrator for the MD evolution |^. ^ 
The efficiency of the algorithm can be measured by the parameter a. We therefore conclude 
that the efficiency of the A-HMC is a factor 1.5 better than the unpreconditioned HMC on 
the “small heavy” lattice, and that of S-HMC by a factor of 1.9 which is even better. 


D. Reversibility 

Before we extend the comparison of the preconditioning to the large (16^ x 48) lattice, 
we describe our choice of the stopping condition for the Wilson-Dirac operator inverter on 
the large lattice, since it is computationally not realistic to keep the very strict conditions of 
Sec. [Ill C| for the large lattice size. The stopping condition in the calculation of the force may 
be relaxed as far as the reversibility condition is maintained, which is tested in the following. 
In this section we employ the “S-HMC” preconditioning to investigate the reversibility. 

As a measure of how far one may loosen the stopping condition, we use the violation of 
the reversibility condition for the effective Hamiltonian defined by 

\AH\ = \H{tr-tr)-H{0)\, (25) 

where H{tr — tr) means the effective Hamiltonian calculated for the reversed conhguration 
which is obtained from the initial conhguration at t = 0 by integrating the equation of 
motion to t = tr and then integrating back to t = 0. The length of trajectory is 1^=1. For 
the S-HMC effective action, we measure \AH\ for several values of the stopping condition on 
20 thermalized conhgurations separated by 10 trajectories. Figures || and ^ show {\AH\/H) 
measured on the “large heavy” and “large light” lattices, respectively. While the violation 
stays around the limit of the double precision arithmetic for the heavy dynamical quark 
(Fig. 1^), it depends on the stopping condition for the light dynamical quark (Fig. |^). 

The behavior for the light quark mass can be understood as follows. If the initial vector 
in the BiCGStab solver is reversible {x = b is adopted in this work), the only source of the 
reversibility violation is the round-off errors in the numerical computation. Therefore, the 

^ In Ref. the maximum efficiency is reached at Pace — 0.61 rather than 0.65. This difference comes 
from the expansion Eq. (^) of the erfc function: the author of Ref. considered the lowest order only, 
while we include the second order. 
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error accumulates as the BiCGStab solver iterates and thus the violation increases as the 
stopping condition is tightened. This can be seen in Fig. from r = 10“® to 10“^. As we 
further decrease the stopping condition, the BiCGStab solver gives a solution vector with 
better accuracy, and the value of (|Ahf|/if) is governed by the accuracy of the solution 
vector. It decreases as we tighten the stopping condition from r = 10“^ to 10“^^. 

As criteria to choose the stopping condition, we demand that the solver iterates to the 
region where the {\AH\/H) is governed by the accuracy of the solution vector and that 
the variation of the Hamiltonian over the trajectory dH is not distorted by the error of the 
solution vector. These criteria are satished for r < 10“^, and we choose 10“® in the following 
simulations in this work. 

For completeness, we also calculate the violation of reversibility in the link variables U 
and the conjugate momenta P 


lAC'l =11 \(U,)aAn)(tr-Q-{U,U{nmr-, 

|AP| = / E - U) - (/>„U(>1)(0)P, 


(26) 


where the sum runs over all sites n, color indices a, b, and vector index /i. The results for 


\AU\ and |AP| normalized by a/9 x 4 x N^oi with N^oi the total number of lattice site are 
also plotted in Figs. | and |^, where we observe the same pattern of the stopping condition 
dependence as that of {\AH\/H). 

Since the MD evolution is chaotic, the violation of reversibility due to the rounding error 
may grow exponentially |^, |^. The UKQCD Collaboration studied the reversibility for the 
same lattice action as ours (but with the asymmetric preconditioning) with similar lattice 
parameters. They conhrmed the exponential instability when the stopping condition is too 


loose |28]. The stopping condition we adopt r < 10 ^ is strict enough and no such problem 


emerges in our case. We also note that in Ref. most of the numerical calculation is made 
with the single precision (32 bits) arithmetic, while we use double precision throughout this 
work. 

For the stopping condition in the Hamiltonian calculation, we keep a strict condition 
r < 10“^^, since there is a large cancellation in the difference dH = H{tr) — H{0), and the 
accuracy of dH is essential for the Metropolis test to be correct. 
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E. Efficiency on large lattices 


We list the simulation parameters used for A-HMC and S-HMC algorithms in Tables pi 
(“large heavy”) and EY|( “large light”). HMC means without preconditioning. We observe 
that the number of MD steps is much reduced for the preconditioned HMC algorithm com¬ 
pared to the unpreconditioned one at the almost same acceptance rate. More precisely, 
using the relation {dH) = Ti{adtY, we can compare the best efficiency of the algorithm 
which depends only on a as in Eq. (P^ . The gain is 1.5 from HMC to A-HMC (1.9 from 
HMC to S-HMC) on the large heavy lattice. For the large light lattice it is 1.8 from HMC 
to A-HMC (2.2 from HMC to S-HMC). The effect of even-odd preconditioning becomes 
more significant for lighter quark masses (note that for infinite quark mass there is no 
choice for even-odd preconditioning and is no improvement). From our tests we conclude 
that the symmetrically even-odd preconditioning is again the best choice within the simple 
even-odd preconditioning. Further improvement of the HMC algorithm may be achieved by 
preconditioning the partition function with incomplete LU factorization type precondition¬ 
ing 1^, |T^ 1^, Hereafter we employ the symmetrically even-odd preconditioned form for 
the 0(a)-improved Wilson-Dirac operator and the QCD partition function to develop the 
PHMC algorithm. 


IV. PHMC ALGORITHM FOR TWO-FLAVOR QCD 


The use of the polynomial hybrid Monte Carlo (PHMC) algorithm is essential for the 
construction of an exact algorithm for odd number of flavors. Before going to the PHMC 
algorithm with odd number of quarks, we investigate and develop the PHMC algorithm 
with two-flavors of quarks. The PHMC algorithm in two-flavor QCD was first proposed by 
Frezzotti and Jansen ii- Some numerical tests of its performance were made P, ^ on 
small lattices and used for the determination of Csw 0 or the running coupling constant 


with the Schrodinger functional method. In this section we perform further tests to explore 
the most effective choice of the polynomial and its degree with the PHMC algorithm in 
two-flavor QCD. The numerical simulation and its comparison to the HMC algorithm are 
carried out with the same lattice parameters employed in Sec. B 
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A. Partition function 


The partition function of two-flavor QCD with the symmetrical preconditioning in the 
PHMC algorithm is given by 


^PHMC 

-f^PHMC [P^U, 0o] 

Sloly[U,4>o] 





Pn, 


poly 


[Di]A 


(27) 


and 5'fgjf/] is the same as in Eq. (^). The difference from Eq. (p2D is in the pseudofermion 
action Spgiy[U,(j)] and in the insertion of the correction factor (det[lToo])^- 

The polynomial PNpoiyi^] — Ci{z — 1)* approximates 1/z for a complex 2 ; placed in 

the convergence region as Npoiy —>• 00 , and the coefficients c* are chosen so as to make the 
correction factor (det[lToo])^ as close to unity as possible for small Npoiy. For this purpose, 
several polynomials have been investigated in the literature. They include Chebyshev [0, 
least-squares |^, and adopted (with or without the UV hltering) ^ polynomials. We 
consider the Chebyshev and adopted polynomials in this work. 

The Chebyshev polynomial with unit circle convergence domain in the complex plane is 
defined by Cj = (—1)*. This is the same as the Taylor expansion with respect to the hopping 
matrix. We call the PHMC algorithm with the Chebyshev polynomial as C-PHMC. In this 
case, the accuracy of the polynomial is characterized by \zPNp^iy[z] — 1| = (^ — 

The coefficients q for the adapted polynomial are determined so as to minimize 

/V, 2 

PoqPnpoIv [PoolVo — Vo with a Gaussian noise vector rjo with unit variance on a thermal- 
ized background gauge configuration [^. The coefficients thus obtained do neither show 
a strong dependence on the background gauge conhguration nor on the noise vector. We 
call this choice as A-PHMC. We note that the adapted polynomial with the UV filtering is 
simple and proven to be more efficient for the unimproved Wilson fermion in the multibo¬ 
son algorithm |^, ^|. For the 0(a)-improved Wilson fermion, on the other hand, the UV 
hltering requires an additional term in the effective Hamiltonian, and to our knowledge its 
efficiency has not been tested yet. We leave it as a future subject to study the efficiency of 
the UV hltering for the PHMC algorithm with the 0(a)-improved Wilson fermion action. 
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B. Force calculation 


The molecular dynamics step in the PHMC algorithm requires a calculation of the force 
5Sp^iy\U, 5U^{n). Frezzotti and Jansen [^, || proposed to use a product representation 
of the polynomial 


^poly 

PN,,ly[^] = £ Q(z - 1)* = CN, 


AT, 


poly 


poly 


2 = 0 


n 

k=l 


Zk)i 


(28) 


with Zk the roots of PNp„iy[A = 0. The computational cost can be reduced by holding the 
intermediate vectors obtained by multiplying monomials on the pseudofermion field. 

In the product representation with a naive ordering of monomials, however, there is a 
problem of numerical instability and accumulation of round-off errors due to the fact that 
the partial product nL=i(^ ~ ^k) fluctuates by several orders of magnitude for intermediate 
im- The problem becomes more severe for small 2 ; and for large order of the polynomial. 
Bunk et al. proposed some ordering schemes to minimize the problem |^. In this work, 
instead of the product representation, we consider the Clenshaw’s recursive relation 


Pn„,,[A = ':<> 


+ 

1 

1 

l + -(^-l) 


U- {z 1) 



L Co 

L Cl 


^Npoly 1 




(29) 


for which the numerical instability is suppressed, because we add from the term giving the 
smallest contribution to the terms with larger contributions. We assume that the polynomial 
is converging and the higher order terms are smaller; otherwise the algorithm does not work 
efficiently. 

Adopting this representation, we expect that the calculation of the pseudofermion action 
Spgiy in Eq. ( pT]) before and after the MD evolution become stable numerically. The force 
from 5S^p,iy[U, cfio] is given by 

^poly I- 


where 


isl,„ = {E 

i=i 


XPi) = 




-(1 + 

XPU) 


H.C., 


(30) 


yPU) = 


-(1 + 

yPU) 


(31) 

(32) 
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vPij) 

O 

yPii) 
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ZPii) = 


-(1 + 

(1 + T)-}M„{1 + 


= (1 + r) 


-1 

oo 


= Ci 


X 


X 


{p 


s 

oo 


i-1 


yi"(o) 


1 + 
1 + 
1 + 


Cj 

9+2 

9+1 

9+3 

9+2 


(of. -1 
(cf„ -1) 


(33) 

(34) 


X 


Cn , 

j ^'poly 


PNpoly-l 


(i)^ - 

oo 


(35) 


In our implementation of the simulation code, we first calculate from j = Np^iy to 0 

and store them on memory. We then calculate and the force from j = 1 to Np^iy using 

the stored We do not need to store The requirement for memory is therefore 

the same as in the product representation used in Refs. P, ||. 

A potential source of the round-off errors in the calculation of the force is the sum over 
j in Eq. (|30|) , because the sum runs from the highest order to the lowest order in k, the jth 
term being of order . The numerical problem in the calculation of the force may be 

checked by looking at the violation of reversibility. We expect that the reversibility violation 
is small compared to HMC because the MD evolution involves no iterative processes and is 
completely deterministic. Numerical stability of the summation representation of polynomial 


and the reversibility of the molecular dynamics will be discussed in Sec. [IV E 


The pseudofermion held (j)o is generated from the Gaussian noise vector rjo at the beginning 
of the molecular dynamics step through 




(36) 


Since Woo is a matrix close to the identity matrix, the inversion of Woo is easily performed 
by any iterative solver within a few iterations. We use the BiCGStab solver in our imple¬ 
mentation. 
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C. Noisy Metropolis test for the correction factor 


In order to construct an exact algorithm, we have to take account of the correction factor 
(det[fhoo])^- We use the noisy Metropolis test method of Kennedy and Kuti [^, which was 
previously applied to make the multiboson algorithm exact in Refs. p!6| . 

After a trial conhguration U' is accepted by the HMC Metropolis test, we make another 
Metropolis test for the correction factor. Generating a Gaussian vector Xo with unit variance 
and zero mean, the probability Pcorr\U —>■ U'] to accept the trial conhguration is given by 


Pcorr\U —>■ U'] = min 1, e 


^-dS 


where 


dS = 


(WJiU'])-^W^\U]x,' -\x. 


(37) 


(38) 


The inverse {Woo\U']) ^ is calculated using the BiGGStab solver as in the generation of the 
pseudofermion held (j)o- 


D. Numerical test of the efficiency 


The total acceptance rate Ptotai of the PHMG algorithm is a product of that of the HMG 
Metropolis test Pace and of the noisy Metropolis test Pcorr- In this section we present several 
numerical tests on how Pace and Pcorr depend on parameters of the algorithm, such as the 
order of polynomial, and the MD step size. The simulations are made on the small heavy 
lattice, whose parameters are summarized in Table |. 

For the choice of polynomial type, we test the (non-Hermitian) Ghebyshev polynomial 
(G-PHMG) and the adapted polynomial |^, ^ (A-PHMG). The order of the polynomial 
tested is listed in Table |^. 

We hrst study the Npoiy dependence of Pace- In Fig. || we show (dH) as a function of 
Npoiy on the small heavy lattice at a fixed dt = 1/32 {Npoiy = 32) for both G-PHMG and 
A-PHMG algorithms. We find that {dH) is almost independent of Npoiy and agrees with 
the same quantity for the S-HMG algorithm. This is expected if the effective action of 
the PHMG algorithm approximates the original action well, because the PHMG replaces 
by a polynomial PN^oiyi^ool and the two are equivalent if the polynomial is a good 
approximation of the inverse. The MD step size dependence of (dH) is plotted in Fig. ^for 
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the usual S-HMC and for the C-PHMC with Npoiy=26 [which we call C-PHMC(26)], where 
we hud good agreement among different algorithms. This means that the acceptance Pace 
in PHMC is almost the same as that in the usual HMC. 

In contrast, Pcorr is expected to be sensitive to Npoiy Since the acceptance rate Pcorr 
is directly related to the expectation value of dS as dehned in Eq. (|5^), we measure the 
dependence of {dS) on Npoiy. Figure shows the plot for C-PHMC and A-PHMC at a 


hxed dt = 1/32 {Npoiy = 32). The dotted lines represent a £t with an exponential form 


{dS) = Tia^ ex]){—2bNpoiy). 


(39) 


The exponential form is expected because the error of the polynomial approximation behaves 
as \zPNp^iy[z] — 1| = {z — in the Chebyshev polynomial case. We hnd that the 

data are well described by the exponential form and that {dS) is much smaller for the 
adapted polynomial A-PHMC than that for the Chebyshev polynomial C-PHMC, which 
demonstrates the efficiency of the adapted polynomial. 

The acceptance rate in the noisy Metropolis step Pcorr is related to {dS) as Pcorr = 
erfc(((iS')^/^/2). We then obtain a plot of Pcorr as a function of Npoiy in Fig- O- The dotted 
curves represent 

Pcorr = erfc exp{-bNpoiy)\ , (40) 


with a and b the parameters in Eq. (|3^) . We clearly see that A-PHMC requires a smaller 
polynomial order Npoiy than C-PHMC to achieve the same acceptance rate. For instance, to 
obtain Pcorr — 0.8 we need Npoiy = 24 for C-PHMC while A-PHMC requires only Npoiy = 18. 

The efficiency of the PHMC algorithm for the noisy Metropolis test step can be quantihed 
by Pcorr/Npoiy, because the number of arithmetic operations is roughly proportional to Npoiy. 
In Fig. B we plot Pcorr/Npoiy against Pcorr, and find that the A-PHMC is about 30% more 
efficient than C-PHMC. We also hnd that the efficiency peaks around Pcorr=0.85. From 
Eq. we obtain 


Pr. 


Pr. 


(41) 


Npoiy b _ log {l{^l-2\og{Pcorr) " 1)} 
using an expansion erfc(x) = exp { —(2/y^)[x + {x^/^/^i)]} + 0{x^). The efficiency is pro¬ 
portional to 1/6, which controls the exponential fall off of {dS) as in Eq. (^), and the 
position of the maximum efficiency depends on a. When a ~ 0(10), as we observe in these 
tests, the maximum appears around PcorT=0.85 and it moves to larger values of Pcorr as a 
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becomes larger. Since a is expected to scale as |T^, the maximum efficiency is 

obtained for Pcorr > 0.85 when the lattice volume becomes larger or when the sea quark 
becomes lighter. 

E. PHMC on large lattices 


The PHMC algorithm works well with a reasonable order of the polynomial on the small 
heavy lattice. It is not trivial, however, whether it really works on larger lattices, because 
we expect that a polynomial with much larger order is needed. 

For a numerical test on the “large heavy” and “large light” lattices (Table |) we consider 
the Chebyshev polynomial (C-PHMC) only, since we were not able to obtain an optimized 


D^Pn,,JD^o]Vo 


Vc 


polynomial for the A-PHMC. The reason is that the minimization of 
with respect to the coefficients of polynomial failed to converge for polynomials of a large 
(~ 100) order which are needed for these large lattices. This is likely a problem of the 
steepest descent algorithm used in the minimization, and not a fundamental difficulty of the 
adapted polynomial. We leave a resolution of this problem to future studies. 

We hrst consider the question of how the polynomial approximation of works for 

reasonably large lattices. To investigate this we dehne a residual 

\PN,„iy[Doo]DooVo - Vo\ 


\Vo 


(42) 


with a Gaussian noise vector rjo. We expect that the residual becomes exponentially smaller 
as Npoiy increases, if the polynomial provides a good approximation of the inverse Dirac 
matrix. We measure this quantity on 20 thermalized conhgurations of large heavy and large 
light lattices and plot it as a function of Npoiy in Fig. |^. For both heavy and light dynamical 
quarks, we hnd a clear exponential decrease, while the slope signihcantly depends on the 
sea quark mass. We also note that the polynomial approximation is not distorted by the 
round-off error even for Npoiy ~ 100-200. 

When the order of polynomial is large, another important test is the check of the re¬ 
versibility in the MD steps. As we mentioned in Sec. P!VB| , our implementation of the force 
calculation may cause round-off errors. As in Sec. [IIID| we investigate the violation of re¬ 
versibility in {\AH\/H), (|Af/|), and (|AP|) by measuring these quantities on the same 
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20 configurations. The results are plotted in Figs. |TJ and for large heavy and large 
light, as a function of Npoiy. We observe no dependence on Npoiy for both lattices and the 
violation of reversibility remains close to the limit of the double precision arithmetic. This 
implies that the Clenshaw-type representation of the polynomial Eq. adopted in our 
implementation of the PHMC algorithm does not accumulate round-off errors even for large 
Npoiy We also emphasize that the violation is much smaller than in the usual HMC plot¬ 
ted in Figs. |] and ^ In the HMC algorithm the number of arithmetic operations can be 
different between forward and backward steps, because the convergence of the BiCGStab 
solver is controlled by the condition that the residual is smaller than a certain value. We 
suspect that the reversibility becomes better if the number of iterations (thus the number of 
arithmetic operations) is hxed in the solver. Even if this is the case, the numerical stability 
is not optimized in the BiCGStab solver, and the PHMC is still expected to perform better 
regarding the reversibility. 

We then measure the actual efficiency on large lattices. The simulation parameters and 
some results are summarized in Tables 0 and 0 for heavy and light dynamical quarks. We 
plot {dS) and Pcorr = erfc(((iS')^/^/2) as functions of Npoiy in Figs. and [I3- Compared 
to the small lattice, substantially larger Npoiy are needed to keep the acceptance rate at 
reasonably large values. 

Furthermore, {dS) and the acceptance depends substantially on the sea quark mass. As 
discussed in Ref. the parameter h, which parametrizes the slope of {dS), is expected to 
be proportional to the quark mass. This expectation is confirmed in our simulations: the 
ratio of the quark masses in the two simulations is 2.04(6), while the ratio of b is 2.15(15). 

The efficiency of the noisy Metropolis step Pcorr/Npoiy is plotted in Fig. The maximum 
efficiency is achieved around Pcorr = 0.9, and the height at the maximum is lower for the 
lighter quark mass than that for the heavier one by about a factor of two, as we expected 
from the ratio of b [and from Eq. (|4l|) ]. 

Finally, we compare the total efficiency of the PHMC algorithm with that of the usual 
HMC. The efficiency is parametrized as Ptota;/[A^Muit/traj], which is plotted in Fig. |T^ against 
the total acceptance ratio Ptotai- The total acceptance ratio Ptotai of the PHMC algorithm 
is dehned by Ptotai = PaccPcorr] for the HMC algorithm it is Ptotai = Pace- The number 
of hopping matrix multiplications to cover a unit trajectory, [AMuit/fraj], is counted in the 
program. The efficiency of PHMC is slightly better than the usual HMC for both heavy and 
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light dynamical quarks. We note that the efficiency of HMC depends substantially on the 
stopping condition imposed. As we discussed in Sec. Eng we carefully chose the stopping 
condition for HMC, but the remaining violation of the reversibility is still large compared to 
the PHMC. Therefore, in order to guarantee the exactness of the algorithm strictly, a strict 
stopping condition is required and then the efficiency of HMC becomes much lower. 


V. PHMC ALGORITHM FOR AN ODD NUMBER OF FLAVORS 

In this section we describe an extension of the PHMC algorithm to the case of odd number 
of flavors. As we already outlined in Sec. H, the algorithm is almost the same as that for 
even number of flavors, except for the polynomial in the evaluation of the correction factor. 
We make a numerical check of the algorithm by comparing the simulation of 1 + 1-flavor 
QCD with the two-flavor case simulated with the HMC and PHMC algorithms. In addition 
we carry out a simulation of 2 -|- 1-flavor QCD and compare the results with that obtained 
by the R algorithm. 

A. PHMC for one-flavor QCD 

In order to construct a real and positive dehnite effective action for one-flavor of dynamical 
quark, we use the trick proposed by Borigi and de Forcrand [0 and Alexandrou et al. [||, 
which was already described in Sec. EH 

A polynomial of even degree PNpoiy[z\ can be split into the product of two polynomials 
TNy,iy[A and as 

PNy„ly[A = TNy„ly[AT Ny,ly[z\, (dd) 

Npoly/‘i 

Tn„M = E (44) 

i=0 

Npoly/‘i 

Tn^M = E dt(z-l)'. (46) 

j=0 

Note that here we use the summation representation for [T instead of the 

product representation as in Eq. (J^). The coefficients di in T^^^^A are determined as follows. 
First, we consider the product representation of PNy,iy[z] as PNy,iy[z] = 

The ordering of the monomials is dehned so that SiTg{zk — 1) increases monotonically 
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with increasing k. Since the roots Zk appear with their complex conjugate, we hnd 
Zk = {k = 1... Npoiy/2). We then split the polynomial into the product of 

two polynomial as - Zk(j)){z - where the reordering in¬ 

dex k{j) is dehned by k{j) = 2j — 1. Then we obtain a “square root” of the polynomial as 
'^NpoiyiA — y/^Npoiy Y^=i^'^{z — Zk(j)), from which we arrive at the polynomial representation 
Eq. by expanding the product representation. Since we do not use the product repre¬ 
sentation of T]\f^^i^[z] in the numerical simulation, the problem of the ordering of monomials 
is irrelevant as long as one uses long enough decimal precision or computer algebra systems 
to obtain the coefficients di. 

We note that ^ TNpaiy[z\ complex z, but for the determinant of the Wilson- 

Dirac operator D one can prove the relation 




using the 75 Hermiticity property = 75 ^ 75 . It follows that 


(46) 


det[P;v,.JI^]] = det[T;v,.jP]] ■ det[T;v,„jP]] = | det[T;v,.jP]]T- (47) 

For the preconditioned case, the Hermiticity is modihed to = 75 ( 1 -|-T)oo£)£,( 1 -|-T)“q 75 , 
for which Eq. (|46|) holds as well. 

The partition function for one-flavor QCD can be written as 




PpHMC [P-iU, 0o] 


S%iy[(i)o] 

sL[u] 


JvUVPV(j)lV(j)o det [Woo]e-^™Mc[TC/,<^o] ^ 

ip^ + s,l(/| + 7„„l« + si,|i7], 

- (log det [1 -f Tee] + log det [1 Too]) ■ 


(48) 


The polynomial PNj,oiy[Doo] two-flavor case Eq. (^Tj) is replaced by . The 

correction factor det [Woo] is the same as that dehned in Eq. (|^) , but the exponent is 1. 

Every step of the HMC part of the simulation is the same as the corresponding step 
in the two-havor case, except that we use the polynomial rather than PNpoiy- The 

pseudofermion held is similarly generated by 


4>o = TNp,iy[Di] % = TNp,ji)^^]i)^^wj7], 


(49) 
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with a Gaussian noise vector rjo at the beginning of each MD step. On the other hand, 
the noisy Metropolis step to incorporate the correction factor requires a special treatment, 
because the correction factor is not det[iyoo]^ but det[iyoo]- 

B. Noisy Metropolis test for the one-flavor case 


If the fermion determinant det[£)fo] is positive, det[hFoo] is also positive and its square 
root is well dehned. We calculate the square root of the matrix Woo by solving the equation 
Al = W„ using the Taylor expansion 

Aoo = 1 + X! ^ = 1 + 2^00 - + iQ^oo ■ ■ ■ (50) 

with 6oo = Woo — 1 , because we expect that Woo is close to the identity matrix when the 
polynomial PNp„iy[Doo] is a good approximation of We obtain 

det[Woo] = |det[Aoo]|^ , (51) 


using the (preconditioned) 75 Hermiticity property = 75(1 + T)ooAoo(l + T)^^ 75. 

Once we obtain the matrix Aoo, we can perform the noisy Metropolis test Eq. (^7l) re¬ 
placing Woo in Eq. (|3|) by Aoo, 


dS 


(A„lU']y' A^\U]xo 



( 52 ) 


The only complication is the use of the Taylor expansion Eq. (^0|) every time we need a 
multiplication with Aoo- For the inverse A~^ we use another polynomial 


Ay' = 1 + = 1 - + gC - ^■’1 ■ ■ ■ ■ (53) 

In the numerical calculation, summation from the lower order to the higher should be avoided 
to reduce round-off errors. We therefore use the following (Clenshaw’s type) expressions: 


Aoo — 


A'-^ = 

^ 00 


1 + 2 ^°° 

1 + Ak, 


1 H—^^00 
1 -1- 

1 + -^^00 


1 H —^^00 ■ 
D 

1 + 


1 + 


3-2k 


1 + 


2k " 
l-2k 


2k 


SL 


( 54 ) 

( 55 ) 


A shortcoming of this method is that we have to recalculate the entire expressions when we 
need to increase the order of truncation k in the Taylor expansion. 
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In order to avoid systematic errors from the truncation of the Taylor expansion, we 
monitor the residual 

\Aoo[U]{A,,[U]xo) -W,JyU]xo\ 


ri = 


\Woo[U]Xo 


in the calculation of Aoo\U]xo-i and 

\Woo[U']{Aoo[U'])-^ {{Aoo[U'])-^UJo} - UJo 


r2 = 


U!n 


(67) 


in the calculation of {Aoo\U'])~^uJo with ujo = Aoo\U]xo- We require that the residuals be 
smaller than 10“^“^ to keep the exactness of the algorithm. In the simulation program we 
always monitor the residuals, and when the residuals become larger than our condition we 
repeat the calculation increasing k until it becomes satished. 

The necessary order of the Taylor expansion depends signihcantly on the order of polyno¬ 
mial Npoiy. If Npoly is large enough, IToo is very close to the identity and the Taylor expansion 
may be truncated at very low orders. Therefore, there is a complicated trade-off between 
Npoly and k to the computational cost in the algorithm. We consider briefly the computa¬ 
tional cost to calculate the square root of the correction matrix and the noisy Metropolis 
acceptance probability as follows. In the case of the Chebyshev polynomial, the residual of 
the correction matrix is estimated as 


- 1 = = {Di - + \ 


(58) 


If we take A as the largest eigenvalue of — 1, this leads to |^oo| — where |A| < 1 

is assumed. To keep the residual of the square root Eq. (^) (for example) lower than a 
constant e, we have the following inequality when the Taylor expansion is truncated at an 
order k: 


ri cx 



|^|(fc+l)(A7poij,-|-l) ^ 


(59) 


with a coefficient C. Thus {k -|- l){Npoiy + 1) must be larger than a constant proportional 
to ln(e). When we £x e as a stopping condition, the truncation order k is chosen so as to 
satisfy Eq. (|5^) . The computational cost to calculate the square root of the correction matrix 
becomes a constant because the number of multiplication of is proportional to fc x Npoiy, 
which is roughly ~ (fc -|- l){Npoiy + 1). Consequently the total amount of the computational 
cost to calculate Eq. (|S^) becomes almost constant. Thus we conclude that the choice of 
Npoly does not affect the cost in the noisy Metropolis test, and that the efficiency of the 
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whole algorithm is governed by the cost of the molecular dynamics step (proportional to 
Npoiy) and by the acceptance rates of the HMC and the noisy Metropolis tests. 

In order to evaluate the correction factor det [hhoo], Takaishi and de Forcrand 0 employed 
the idea of the unbiased stochastic estimator using y^det [hFoo[17']2/lFoo[f^]^] = 
from several estimates of with dS dehned in Eq. (|38D for the Nf = 2. Their method 

is faster than ours because they do not need to calculate the square root of the correction 
matrix as we did in Eqs. (|50D and (^). On the other hand, the stochastic estimator may 
produce negative probabilities for the Metropolis test, which leads to systematic errors in 
the hnal results. In order to avoid this problem they keep dS sufficiently small with a high 
acceptance ratio so that the negative probabilities within a desired trajectory length do not 
appear. In our method these problems are avoided at the price of additional computational 
costs by taking explicitly the square root of the correction matrix. 


C. Numerical test with (1 + l)-flavor QCD 


The algorithm for one-flavor of dynamical fermion can be tested by considering (1 -|- 1)- 
flavor QCD, which should be identical to two-flavor QCD. Since we already have results with 
established algorithms for two-flavor QCD, we check if we can reproduce the results with the 
(1 -|- l)-flavor QCD simulation. For (1 -|- 1) flavors, we introduce two sets of pseudofermion 
helds (/=!, 2) with the effective action = \TNp^iy[DooWm‘^- The correction 


factor det[hFoo] is evaluated twice with the noisy Metropolis test described in Sec. [VB . 


Simulation parameters and some results on our small heavy lattice are listed in Table [VTl . 
We employ the Chebyshev polynomial of order Npoiy = 26 both in the two-flavor simulation 
and in the (1 -|- l)-flavor simulation with PHMC algorithms. Note that the order of the 
polynomial Tvpoij, in iVj = 1 -|- 1 is 26/2 = 13 by its dehnition for each pseudofermion. We 
also have a result with the standard S-HMC. 

We observe in Table |V11| that the three algorithms give a consistent plaquette expectation 
value within the statistical error of less than 0.1%. It is evident that the algorithm for odd 
number of flavors works as we expected. The statistical error is evaluated with the binned 
jack-knife method and the bin size is increased until the error ceases to grow. 

In the same table we hnd that (dH), which controls the HMC acceptance Pace, is sig- 
nihcantly smaller for the (1 -|- l)-flavor simulation at the same MD step size dt. The size 







of {dH) depends on the precise form of the Hamiltonian we consider. While the formula 
described in Ref. may be employed to examine this issue, we do not pursue it here 
because of the complication of the force contribution from the pseudofermion action. Note 
that this decrease of {dH) in the Nf = 1 + 1 case does not immediately mean an increase 
of the efficiency. The reason is that we expect the duplication of the pseudofermion field to 

cause an extension of the autocorrelation time. 

_2 

We find that the acceptance rate Pmr^ in the correction factor for the two-flavor case 
is related to those of the (1 + l)-flavor simulation as — {Pcor^^Y- This property can 

be explained as follows: Expanding dS^f^‘^ in Eq. (|^) in terms of 5oo = lToo[f^] — 1 and 
5'^^ = lToo[f^1 — 1, we obtain — 5') ooX<^ np to 0((5^, 55'). On the other 

hand, dS^f^^ in Eq. (|5^) is expressed as dS^f^^ = Re[xf(5 — 5')ooXo\- Up to higher orders 


in 5oo and 5'^^ we then obtain ~ 2dS^f^^ and ~ (PclrP^)'^. 

We also test our algorithm on large heavy and large light lattices. The convergence of 
the polynomial and of the Taylor expansion of the correction factor is non-trivial 

on these large lattice sizes. To investigate the convergence of the polynomial we 

perform the same check as that made for PN^aiyi^ool- Fig. ^ we show the convergence 
behavior using 




Tn,,JDUTm,JDoo\DooVo - Vc 


\Vo 


(60) 


as the residual. Here rjo is a Gaussian noise vector and the measurement is made on 20 
thermalized configurations separated by ten trajectories. Since T should 
be PNj,„iy[Do(\ by definition, Eq. ( p]|) must be identical to Eq. (^2]) except for round-off 
errors. As shown in Fig. ^ Eq. ( 0 ) decreases exponentially as increases, which is 


the same behavior as in Fig. Thus we confirm that there is no unexpected accumulation 
of round-off errors in the calculation of with our choice of N^oiy [£)fj is 

also evaluated with the Clenshaw’s recurrence formula). The violation of reversibility is 
extremely small as plotted in Figs. ^ and Their magnitude stays around the limit of 
the double precision arithmetic, which parallels our finding with the two-flavor case (Figs. 0 
and |T^. 

Figures ^ (large heavy) and ^ (large light) show the convergence behavior of the Tay¬ 
lor expansion of the correction matrix as a function of the order of the expansion. The 
convergence is monitored with the residuals ri and r 2 defined in Eqs. (|56| ) and ( 0 ) , respec- 
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tively. We also monitor the convergence of the weight dS dehned in Eq. by measuring 
Ids' — dSendl, where dSend is the value of dS at the highest order of the expansion. These 
hgures are also plotted with measurements on 20 conhgurations separated by 10 trajectories. 
Open symbols are obtained for the smallest Npoiy (70 for large heavy, 100 for large light), and 
hlled ones are for the largest Np^iy (190 for large heavy, 200 for large light). The convergence 
of the residuals is almost exponential. The slope, however, becomes weaker near the limit 
of the double precision arithmetic. In the region where the exponential decay is observed, 
k X Npoiy seems to behave as roughly constant irrespective of the choice of Npoiy. This is the 
expected behavior discussed in Sec. |VB| . When the stopping condition for ri and r 2 is set 
to be 10“^"^, the improvement of \dS — dSond\ stops at ~ 10“^^. Since dS itself is of 0(10“^), 
we expect that dS has ~ 10 digits of signihcant hgure, which we expect to be sufficient for 
current simulation trajectory lengths. The negative eigenvalue problem did not occur in 
these investigations, probably because of the intermediate quark mass we employed. 

Table [VIII shows the simulation statistics for C-PHMC with Nf = 1 + 1 on both of 
the large lattices. We obtain results for the averaged plaquette value which are consistent 


with those for the Nf = 2 case. The relation Pcorr — {Pcofr )^ holds again for such 


large lattice sizes, and we did not encounter the negative eigenvalue problem during the 
long trajectories (~ 1000). We expect that the total efficiency has the same functional 
dependence on Npoiy as that with the Nf = 2 PHMC, since the behavior on Npoiy is mostly 
ruled by the molecular dynamics. The actual value of the total efficiency is slightly worse 
than that with the Nf = 2 PHMC algorithm due to the two pseudofermion generations, 
the Hamiltonian calculation, and monitoring of the residual in the noisy Metropolis test. 
We note that the autocorrelation time may be extended by the increase of the dynamical 
variable in the path integral. Examination of this point is left for future studies. With 
the numerical tests described here we conclude that our PHMC algorithm for one-flavor 
dynamical quark works well even for a moderately large lattice size 16^ x 48 at intermediate 
quark masses of mps/my ~ 0.7 — 0.8, at least in the Nf = 1 + 1 case. 


D. A (2 -I- l)-flavor QCD simulation 

Combining the two-flavor HMC algorithm with the one-flavor PHMC leads to an exact 
algorithm for (2 -|- l)-flavor QCD. For the two-flavor part, we may also choose the PHMC 
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if the usable amount of memory allows to store work vectors. A test of the algorithm can 
be performed comparing the results with those of the R algorithm after an extrapolation 
to zero step size in the latter. In this section we show the results of such a comparison on 
some small lattices. 

The numerical test is made with the following two sets of lattice parameters. One set 
uses a lattice of size 4^ x 8 at /3 = 4.8, sea quark mass of Kud = 0.150 for two light flavors, 
and Ks = 0.140 for the third flavor, and Csw = 1-0 for all three flavors {Nj = 2 + 1 small). 
The order of the polynomial is set to Npoiy = 10 for the single flavor. The second set uses a 
8 ^ X 16 lattice, [3 = 5.0, Kud = 0.1338, and Kg = 0.1330, and Csw = 2.08 {Nf = 2 + 1 middle), 
where Npoiy = 58 is employed. For both lattice sizes we use the Chebyshev polynomial with 
unit circle convergence domain. 

The simulation statistics is tabulated in Tables and ^ together with the plaquette 
expectation value extracted from the R algorithm. Figures ^ and ^ show the plaquette 
expectation value from the runs with the R algorithm at several values of the MD step size 
dt (open symbols). Filled symbols are from the PHMC algorithm. The plaquette values 
with the R algorithm extrapolated to zero step size are plotted with dotted horizontal lines. 
We observe that our exact algorithm (hlled symbols) gives results at a hnite dt (see also 
Tables and 0 consistent with the extrapolated value (horizontal dotted line) of the R 
algorithm. Because of the hnite dt dependence, the cost to obtain reliable results with the 
R algorithm is higher than that of the PHMC algorithm. 

For larger and realistic lattice sizes, we started a parameter search in order to realize a 
physical volume L ~ 1.7 — 2.0 fm, a lattice cutoff a~^ ~ 1.5 — 2.0 GeV, and pseudoscalar to 
vector meson mass ratios mps/my ~ 0.7 — 0.8. During the parameter search we found an 


unexpected hrst-order phase transition Details of this search, including the property 
of the PHMC algorithm with the realistic parameters in the Nf = 2 + 1 case on large lattice 
sizes, will be reported elsewhere. 


VI. CONCLUSIONS 

In this paper, we introduced a polynomial hybrid Monte Carlo (PHMC) algorithm which 
is applicable to QCD with an odd number of flavors. The algorithm is an extension of 
the one by Takaishi and de Forcrand ^ to the 0(a)-improved Wilson quark action. We 
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also described a method to remove the systematic error from the non-Hermitian polynomial 
approximation to the invese of the Wilson-Dirac operator in the single flavor case. 

An important technical point uncovered in our work concerns the choice of the even-odd 
preconditioning to the 0(a)-improved Wilson-Dirac operator. Asymmetric and symmetric 
even-odd preconditionings were introduced and investigated in the HMC algorithm with 
two-flavor dynamical quarks. We found that the HMC algorithm with the symmetrically 
even-odd preconditioned form of the lattice QCD partition function yields roughly a factor 
two gain in efficiency over the unpreconditioned one. This performance exceeds the gain 
of about 1.5 for the asymmetrical preconditioning employed in actual simulations so far. 
We, then, decided to use the symmetrically even-odd preconditioned form for the quark 
determinant for the PHMC algorithm. 

We explored distinctive features of the PHMC algorithm using the case of two flavors 
of quarks where comparisons with the standard HMC are possible. Our hndings are (i) 
the reversibility is much better with the PHMC algorithm because of the fully determinis¬ 
tic nature of multiplication with the Wilson-Dirac operator in the force calculation in the 
molecular dynamics step, (ii) for the order of the polynomial chosen sufficiently large, the 
total efficiency of the PHMC algorithm is almost identical to or rather better than that with 
the HMC algorithm. Hence the PHMC algorithm is an alternative for Nf = 2 dynamical 
QCD simulations on moderately large lattice size in the intermediate quark mass region 
mps/rny rsj Q _ 'Y— 0.8. 

We demonstrated the consistency and applicability of the PHMC algorithm for an odd 
number of flavors by considering the case of two single-flavor pseudofermions {Nf = 1 -|- 1 
QCD) and comparing it with the established algorithm for the two-flavor pseudofermion 
{Nf = 2 QCD). The reversibility holds to almost the same degree as that with the Nf = 2 
PHMC algorithm. The noisy Metropolis test for single-flavor part, in which we have to take 
the square root of the correction matrix explicitly, works well on moderately large lattices 
with intermediate quark masses of mps/my ~ 0.7 — 0.8. 

Finally we constructed a PHMC algorithm for 2-1-1 flavors of quarks by combining a 
two-flavored pseudofermion, which is employed in the usual HMC algorithm, and a single- 
flavored pseudofermion described by the polynomial approximation. Running the algorithm 
on two small lattice sizes we conhrmed an agreement of plaquette values with those from 
the R algorithm after an extrapolation to the zero step size in the latter. 
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We conclude that the PHMC algorithm is a viable choice for realistic simulations of lattice 
QCD with 2+1 flavors. Since our numerical tests show that the computational cost for two 
single-flavor pseudofermions is comparable to that of the two-flavor case, the cost for the 
single-flavor part of the (2 + l)-flavor QCD is about a half of the two-flavor part. We thus 
expect that the simulation of the (2 + l)-flavor QCD may be performed with a cost of a 
factor 1.5 — 2 compared to the two-flavor QCD simulation. 
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APPENDIX A: FORCE CALCULATION IN THE HMC ALGORITHMS 

In this appendix we describe the explicit form of the qnark force in the HMC algorithms 
for different preconditionings. Since most of the definitions and extractions of the quark 
force are common to the standard Wilson quark action, we only show the variation of the 
quark action under an infinitesimal change of the gauge link variable as defined in Eq. (0i- 


1. Without preconditioning 


If we do not apply the even-odd preconditioning, the force from the pseudofermion field 


is simply written as 


6Sg = {-X^6DY} + II.C., 

(Al) 

where 


X = 


(A 2 ) 


Y = I 

D-V, 

(A3) 


6D = 

^ 5Moe 5Too ) ' 

(A4) 


The contribution from the derivative of the hopping matrix 6Meo{oe) is the same as that in 
the Wilson action. The contribution from the SW term 6Tf.g(^oo) is shown in Fig. where 
X is a 3 X 3 matrix defined by 





"t^dirac 


cr^^F(n)X(n) 


+ H.C. 


(AS) 


trdirac[' ■ ■] means the trace over the spinor indices. 


2. Asymmetric preconditioning 


The force from the pseudofermion field with the asymmetric preconditioning is given by 


6S^ = [-X^^6DY^} + H.C., 


where 




' -(1 + \ 

V T'' j ’ 


(A 6 ) 


(A7) 
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f -(1 + \ 

V ]' 

Xo = 

Note that = 75^;^75 and ^ 75t'/e(,75- 

In addition we need the force from the determinant of the SW term, 


(A8) 

(A9) 

(AlO) 


6Si, = -2Tr 


5Tee(l + T) 


(All) 


This is calculated only for even sites. The term X in Fig. ^ from the SW term is replaced 
by 


( X ) (n) 




8 

%Ca^l^K 


trdirac cT^t.y^(n)X^(n)^ 


trdirac ^ (l + T)-\n) 


•^n,evensite 


+H.C. 


(A12) 


3. Symmetric preconditioning 


For the symmetric preconditioning the force is separated into two parts as 


where 


5Sl = [-X^^SMY^ - X^UTZ^} + H.C., 


SM = 


6T = 


X^ = 


= 


0 6M,o 
5Moe 0 

5Tee 0 
0 6T,. 


V 
/ 

V 


■(l+T)„>Af„tA'f 

Y 

-(1 + rrjhuY 

yS 


(A13) 

(AM) 

(A16) 

(A16) 

(A17) 
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V (1 + r)-‘M„(l + j ’ 

x! = (i + r)-'(i)f„y 
i'/ = (of.) "Vo. 


(A18) 

(A19) 

(A20) 


The 75 Hermiticity is slightly different for which is =75(i + r)o.ct(i + r)«S. 
The force contribution from the determinant of the SW term is written as 


SSi, = -2Tr 


i5r(i + r) 


-1 


(A21) 


at every lattice site. The term X in Fig. ^ is replaced by 


(x) (n) 

\ / flly 




-trdi, 




^Ccw ^ . 


-trdii 


cr/,i.(l + T) An) 


+ H.C. 
(A22) 


4. PHMC 


In the PHMC algorithm, the term X from the SW-term 6T in Fig. is written as 


(A 


tCa^Kj 


[n = 


fiv 


■ ^ (trdirac ) - ^^^Hdirac a^y{l + T) ^(n) 


i=i 


+H.C. 


(A23) 
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100, 50, 100, 50, 50, 40, 

40, 30, 40, 32, 32, 25, 32 

25, 20 25, 20 20 

18, 20, 

22, 24, 

28, 30, 

^poly 

(for A^md = 32) 

26, 

(for all A^md)- 

Stopping condition 

10 “^^ 10 “^^ 

(force) 

Stopping condition 

10-14 10-14 

(Hamiltonian) 

“This is used to generate a pseudofermion field and global Metropolis test for the correction factor. 


10-12 

10-14 10-14 “ 10-14 “ 


14, 16, 
18, 20, 
22 
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TABLE III: Simulation with the HMC algorithm on the large heavy lattice. 



HMC 

A-HMC 

S-HMC 

^MD 

160 

100 

80 

Stopping condition 


10 -i» a 

10 ““ 

io-« 

(force) 

Stopping condition 

—20 a 

10-14 

10-14 

(Hamiltonian) 

Trajectories 

3000 

1200 

1200 

{dH) 

0.144(15) 

0.182(17) 

0.187(28) 

HMC acceptance 

0.799(9) 

0.764(12) 

0.759(23) 

Plaquette 

0.52801(10) 

0.52803(9) 

0.52827(13) 

“The residual is defnined by \Ax — b\ in the HMC case. 

TABLE IV: Same as in Table HI but for the large light lattice. 



HMC 

A-HMC 

S-HMC 

A^md 

200 

125 

100 

Stopping condition 

10-18 a 

10 -® 

10 -® 

(force) 

Stopping condition 

20 a 

10-14 

10-14 

(Hamiltonian) 

Trajectories 

3000 

1200 

850 

{dH) 

0.313(23) 

0.182(17) 

0.218(22) 

HMC acceptance 

0.702(11) 

0.724(13) 

0.761(16) 

Plaquette 

0.53413(5) 

0.53404(9) 

0.53393(11) 


“The residual is defnined by \Ax — 5| in the HMC case. 


40 















TABLE V: Simulation with the C-PHMC algorithm on the large heavy lattice. 



C-PHMC(70) 

C-PHMC(80) 

C-PHMC(90) 

-^MD 

80 

80 

80 

Stopping condition “ 

10-14 

10-14 

10-14 

Trajectories 

1300 

1000 

1000 

{dH) 

0.151(21) 

0.187(22) 

0.154(31) 

BMC acceptance 

0.775(17) 

0.763(23) 

0.787(21) 

{dS) 

0.244(32) 

0.069(24) 

0.013(7) 

Correction acceptance 

0.731(20) 

0.851(21) 

0.930(13) 

Total acceptance 

0.568(17) 

0.658(27) 

0.732(30) 

Plaquette 

0.52803(11) 

0.52809(10) 

0.52809(10) 


“This is used for the generation of the pseudofermion field and the calculation of the correction factor. 


TABLE VI: Same as Table ^ but for the large light lattice. 



C-PBMC(120) 

C-PBMC(140) 

C-PBMC(160) 

A^md 

100 

100 

100 

Stopping condition “ 

10-14 

10-14 

10-14 

Trajectories 

1600 

1200 

1100 

{dH) 

0.197(18) 

0.243(43) 

0.194(20) 

BMC acceptance 

0.750(12) 

0.768(13) 

0.765(14) 

{dS) 

0.719(48) 

0.188(17) 

0.052(14) 

Correction acceptance 

0.563(18) 

0.770(12) 

0.886(19) 

Total acceptance 

0.422(14) 

0.597(15) 

0.678(17) 

Plaquette 

0.53417(11) 

0.53396(18) 

0.53411(15) 


“This is used for the generation of the pseudofermion field and the calculation of the correction factor. 
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TABLE VII: A comparison of the two- and (1 -|- l)-flavor QCD simulations at /3 = 5.0, 8^ x 16, 
K = 0.1415, Csw = 1.855. 



S-HMC 

Nj = 2 

C-PHMC(26) 

Nf = 2 

C-PHMC(26) 

Nf = 1 + 1 

A'md 

32 

32 

32 

Trajectories 

5000 

5000 

5000 

{dH) 

0.2634(107) 

0.2236(106) 

0.1262(70) 

HMC acceptance 

0.7172(79) 

0.7444(70) 

0.7994(78) 

{dS) (quark 1) 

- 

0.0553(59) 

0.0234(36) 

Correction acceptance (quark 1) 

- 

0.8595(53) 

0.9264(54) 

{dS) (quark 2) 

- 

- 

0.0167(37) 

Correction acceptance (quark 2) 

- 

- 

0.9370(58) 

Total acceptance 

0.7172(79) 

0.6398(117) 

0.6950(74) 

Plaquette 

0.43877(22) 

0.43839(27) 

0.43857(20) 


TABLE VIII: Simulation statistics with the Nf = 1 -|- 1 C-PHMC algorithm on large lattices. 


Large heavy Large light 

C-PHMC(80) C-PHMC(140) 



Nf = 1 + 1 

Nf = 1 + 1 

A'md 

80 

100 

Trajectories 

1000 

1500 

{dH) 

0.081(14) 

0.042(11) 

HMC acceptance 

0.829(14) 

0.872(14) 

{dS) (quark 1) 

0.014(7) 

0.042(10) 

Correction acceptance (quark 1) 

0.944(11) 

0.878(9) 

{dS) (quark 2) 

0.0084(61) 

0.047(10) 

Correction acceptance (quark 2) 

0.936(9) 

0.876(16) 

Total acceptance 

0.733(20) 

0.671(20) 

Plaquette 

0.52782(12) 

0.53392(9) 
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TABLE IX: Simulation parameters for a (2 + l)-flavor QCD simulation. j3 = 4.8,4^ x 8,Csw = 
1.00, Kud = 0.150, Ks = 0.140 are used. The stopping conditions are defined as follows; (a) the force 
calculation from the Nf = 2 pseudofermion action, (b) the calculation of the Hamiltonian of the 
Nf = 2 pseudofermion action, (c) the generation of the pseudofermion field, and the calculation of 
the correction factor for the single flavor part. 



Hybrid-R (extrapolated) 

C-PHMC(IO) 

C-PHMC(IO) 


Nf = 2 + 1 

Nf = 2 + 1 

Nf = 2 + 1 

Nud 

- 

20 

10 

Stopping condition (a) 

- 

10-14 

10-14 

Stopping condition (b) 

- 

10-14 

10-14 

Stopping condition (c) 

- 

10-14 

10-14 

{dH) 

- 

0.055(5) 

0.839(28) 

HMC acceptance ratio 

- 

0.877(7) 

0.521(12) 

{dS) 

- 

0.00014(61) 

0.00056(62) 

Correction acceptance ratio 

- 

0.9843(21) 

0.9861(22) 

Total acceptance ratio 

- 

0.864(7) 

0.514(12) 

Plaquette 

0.39702(13) 

0.39669(38) 

0.39695(32) 
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TABLE X: Simulation parameters for a (2 + l)-flavor QCD simulation. (3 = 5.0, 8^ x 16, Cgw = 
2.08, = 0.1338, Ats = 0.1330 are used. The definition of the stopping condition (a)-(c) is the 


same as those in Table IX. 



Hybrid-R (extrapolated) 

Nf = 2 + 1 

C-PHMC(58) 

Nf = 2 + 1 

A'md 

- 

32 

Stopping condition (a) 

- 

10-9 

Stopping condition (b) 

- 

10-14 

Stopping condition (c) 

- 

10-14 

{dH) 

- 

0.194(9) 

HMC acceptance ratio 

- 

0.743(7) 

{dS) 

- 

0.019(3) 

Correction acceptance ratio 

- 

0.926(5) 

Total acceptance ratio 

- 

0.688(8) 

Plaquette 

0.53161(7) 

0.53145(11) 
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FIG. 1: MD step size dependence of {dH) for two integration methods (UPU and PUP) in the 
MD evolution. The lines show the fit with {dH) = 7r(a • dt)^. 



dt 

FIG. 2: MD step size dependence of the acceptance for two integration methods {UPU and PUP) 
in the MD evolution. The lines show the function erfc[-y/i'(a • dt)^/2] with a obtained from Fig. |^. 


45 







FIG. 3: MD step size dependence of {dH) for preconditioned and unpreconditioned effective 
actions. The lines show the fit with {dH) = 7r(a • dt)^. 



dt 


FIG. 4: MD step size dependence of the acceptance for preconditioned and unpreconditioned 
effective actions. The dashed lines show the function erfc[y^(a • dt)^/2] with a obtained from 
Fig. I The dotted lines are approximations exp [—{a ■ dt)"^ — (a • dtY/2\. 
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FIG. 5: Efficiency Pace • dt. The lines show the function 
obtained in Fig. 
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FIG. 6: The violation of the reversibility as a function of the stopping condition on the large 
heavy lattice. 
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FIG. 7: 


Same as Fig. ^ but for the large light lattice. 
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FIG. 8: Npoiy dependence of {dH) on the small heavy lattice. {dH) with the S-HMG algorithm is 
also plotted on the most right side in the figure for comparison. 
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dt 


FIG. 9; {dH) versus dt with C-PHMC(26) and S-HMC algorithms. 
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FIG. 10: {dS) versus Npdy in the PHMG algorithm. The lines show a fit function {dS) = 
TTo^ exp{-2bNpoiy). 
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FIG. 11: Pcorr as a function of Npoiy. 
a and b are obtained from a fit in Fig. 
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FIG. 12: Efficiency of the noisy Metropolis test Pcorr/Pbpoiy See the text for details. 
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FIG. 13: Npoiy dependence of the residual HPNp^iyiDoolDooVo - ??o|/ho|)- 
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FIG. 14; Npoiy dependence of the reversibility violation on the large heavy lattice. 
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FIG. 15: 


Same as Fig. 14 but for the large light lattice. 
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FIG. 16: (dS) versus Np^iy for the large haevy (pentagons) and large light (diamonds) lattices. 
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FIG. 17; Pcorr versus Npoiy for the large haevy (pentagons) and large light (diamonds) lattices. 
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FIG. 18: Efficiency Pcorr/^poiy versus Pcorr on the large heavy (pentagons) and large light 

(diamonds) lattices. 
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FIG. 19: Total efficiency Ptota;/[A^Muit/traj] as a function of Ptotai- 
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FIG. 20: Npoiy dependence of the residual {\TNj„iy[Doo\TNpoiy[Doo]'no - ??o|/|??o|) {Nf = 1 + 1). 
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FIG. 21; Npoiy dependence of the reversibility violation on the large heavy lattice {Nf = 1 + 1). 
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FIG. 22: 


Same as Fig. 21 but for the large light lattice. 
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FIG. 23: Convergence behavior of the Taylor expansion of the correction matrix on the large 
heavy lattice. Open: Np^iy = 70; filled: Npoiy = 190. 
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FIG. 24: Same as Fig. ^ but for the large light lattice. Open: Npoiy = 100; filled: Npoiy = 200. 
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FIG. 25: MD step size dt dependence of the plaquette expectation value on the lattice of size 
4^ X 8 at /3 = 4.8, csw = 1-00, Hud = 0.150, Kg = 0.140. Open circles are results of the R algorithm, 
and the filled circles are from our exact algorithm. 
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FIG. 26: MD step size dt dependence of the plaquette expectation value on the lattice of size 
8^ X 16 at /3 = 5.0, csw = 2.08, Kud = 0.1338, Kg = 0.1330. Open circles are results of the R 
algorithm, and the filled circle is from our exact algorithm. 



n+v n+v n+v n+v 



n-v n-v n-v n-v 


FIG. 27: Diagrams contributing to F^{n) from the SW term. 










